Major changes in the core developmental 
pathways of nematodes: Romanomermis 
culicivorax reveals the derived status of the 
Caenorhabditis elegans model 


Philipp H. Schiffer 1,†, Michael Kroiher 1, Christopher Kraus 1, Georgios D. Koutsovoulos 2, Sujai Kumar 2,
Julia I. R. Camps 1, Ndifon A. Nsah 1, Dominik Stappert 3, Krystalynne Morris 4, Peter Heger 1, Janine
Altmüller 5, Peter Frommolt 5, Peter Nürnberg 5, W. Kelley Thomas 4, Mark L. Blaxter 2 and Einhard
Schierenberg 1






† Corresponding author: PHS - ORCiD: 0000-0001-6776-0934 - p.schiffer@uni-koeln.de
1. Zoologisches Institut, Universität zu Köln, Cologne, NRW, Germany




2. Institute of Evolutionary Biology, School of Biological Sciences, The University of Edinburgh, Edin- 
burgh, Scotland, UK 

3. Institut für Entwicklungsbiologie, Universität zu Köln, Cologne, NRW, Germany

4. Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, USA 

5. Cologne Center for Genomics, Universität zu Köln, Cologne, NRW, Germany






Keywords: nematode, genome, Mermithida, development, Caenorhabditis 



Abstract 



Background 

Despite its status as a model organism, the development of Caenorhabditis elegans is not necessarily archetypical 
for nematodes. The phylum Nematoda is divided into the Chromadorea (which includes C. elegans) and the Enoplea.
Compared to C. elegans, enoplean nematodes have very different patterns of cell division and determination. 
Embryogenesis of the enoplean Romanomermis culicivorax has been studied in great detail, but the genetic 
circuitry underpinning development in this species is unknown. 






Results 

We created a draft genome of R. culicivorax and compared its developmental gene content with that of two 
nematodes, C. elegans and Trichinella spiralis (another enoplean), and a representative arthropod Tribolium 
castaneum. This genome evidence shows that R. culicivorax retains components of the conserved metazoan 
developmental toolkit lost in C. elegans. T. spiralis has independently lost even more of the toolkit than has C. 
elegans. However, the C. elegans toolkit is not simply depauperate, as many genes essential for embryogenesis 
in C. elegans are unique to this lineage, or have only extremely divergent homologues in R. culicivorax and T. 
spiralis. These data imply fundamental differences in the genetic programmes for early cell specification, inductive 
interactions, vulva formation and sex determination. 



Conclusions 

Thus nematodes, despite their apparent phylum-wide morphological conservatism, have evolved major differences 
in the molecular logic of their development. R. culicivorax serves as a tractable, contrasting model to C. elegans 
for understanding how divergent genomic and thus regulatory backgrounds can generate a conserved phenotype. 
The availability of the draft genome will promote use of R. culicivorax as a research model. 

Background

Species in the phylum Nematoda have a generally conserved body plan. The classic nematode form is 
dictated by the presence of a hydroskeleton, where longitudinal muscles act against an inextensible extracellular
cuticle. What is more surprising is the conservation of organ systems between nematode species,
with, for example, the nervous system and the somatic gonad and vulva having very similar external and 
cellular morphologies. It might be thought that these similar morphologies and cellular structures arise from 
highly stereotypical developmental programmes, but observational data are emerging that challenge this "all 
nematodes are equal" view. The embryonic development of the nematode Caenorhabditis elegans (Rhabditina,
Rhabditida, Chromadorea; see De Ley and Blaxter [1]) has become a paradigmatic model for studying
developmental processes in animals, including earliest soma-germline separation, fate specification through 
cell-cell interactions, and differentiation. The particular mode of development of C. elegans is distinct within 
the major metazoan model organisms, but much of the regulatory logic of its development is comparable to 
that in other phyla. One key aspect in which C. elegans differs from vertebrate and arthropod models is that 
C. elegans has a strictly determined developmental programme [2], with a largely invariant cell lineage giving
rise to predictable sets of differentiated cells [3]. Inductive cell-cell interactions are, nevertheless, essential
for its correct development [2]. The first description of the early embryonic cell lineage of a nematode,
that of Ascaris (Spirurina) in the 1880s [4,5], conforms to the C. elegans model.

Early development across all three suborders of the Rhabditida (i.e. Rhabditina, Tylenchina and 
Spirurina sensu De Ley and Blaxter [1]) is very similar [6,7]. In general only relatively minor variations on
the division pattern observed in C. elegans, including heterochrony in the timing of particular cell divisions, 
and restrictions in cell-cell interaction due to different placement of embryonic blastomeres within the 
eggshell following altered orientations of cell division spindles, have been described in these nematodes [8,9].
From this large body of work it might be assumed that all nematodes follow a C. elegans-like pattern
of development. Deviations from the C. elegans pattern observed in these rhabditid nematodes indicate 
that the determined mode of development is subject to evolutionary change, and have assisted in revealing 
the underpinning regulatory logic of the system. Indeed, a greater role for regulative interactions in early
development has been characterised in some rhabditids, such as Acrobeloides nanus (Tylenchina) [10,11].
Regulative development is common in Metazoa, and is also observed in other ecdysozoan taxa (e.g. within
the Arthropoda). The determined mode found in C. elegans is thus likely to be derived. Molecular
and morphological systematics of the phylum Nematoda identify two classes: Chromadorea (including
Rhabditida), and Enoplea (subdivided into Dorylaimia and Enoplia) [1,12] (Figure 1). In several Enoplea,
early embryos do not display polarised early divisions, and observational and experimental evidence argues
against a strongly determined mode of development [13,14]. Strongly determinative development may
thus be derived even within Nematoda [15]. This implies that the underpinning developmental system in
Nematoda has changed, while maintaining a very similar organismal output. This phenomenon, termed
'developmental system drift' [16], allows independent selection on the mechanism and the final form
produced by it. Exploring mechanistic aspects of development of enoplean and other non-rhabditid
nematodes requires tractable experimental systems with a wealth of underpinning methodological tools and
extensive genetic data. While C. elegans and its embryos are relatively easily manipulated and observed,
and the C. elegans genome has been fully sequenced [17], embryos from taxa in Enoplia and Dorylaimia are
much harder to culture and manipulate. Few viable laboratory cultures exist and obtaining large numbers
of embryos from wild material is difficult. Functional molecular analysis of most nematode species,
and of Enoplia and Dorylaimia in particular, is further hindered by the lack of genetic tools such as mutant
analysis or gene-knockdown via RNAi.

While realisation of extensive programmes of comparative experimental embryology across the phylum 
Nematoda remains a distant research goal, we have taken a parallel genome-based approach. Using the 
background knowledge of pathways and modules used in other taxa, the underpinning logic of a species' 
developmental system can be inferred from its genome, and the developmental toolkits of different species 
can be compared. These comparisons can pinpoint changes in developmental logic between taxa by 
identifying genes unique to one species or group, and gene losses during evolution, that must result in 
changed pathway functioning. Efficient generation of genomic resources for non-model species, and the 
inference of developmental regulatory pathways from the encoded gene sets, is now possible. The majority of 
the 11 genome sequences determined to date for Nematoda have been from Rhabditida (e.g. C. elegans and
congeners) [18-25]. A single member of Enoplea, the mammalian parasite Trichinella spiralis (Dorylaimia;
order Trichocephalida), has been sequenced [26]. T. spiralis is ovoviviparous, and proper development
requires the intrauterine environment. T. spiralis blastomeres are extremely transparent [27] such that
individual nuclei are hard to identify (E.S., unpublished observations). Hence this species is of very limited
value for image analysis and experimental investigations correlating cellular aspects and the underpinning
molecular logic of early development. The genomes of many additional nematode species are being sequenced
and annotated [28,29], but even in this wider sampling of the phylum, Enoplia and Dorylaimia are neglected.



Romanomermis culicivorax (order Mermithida within Dorylaimia) has been established in culture for
decades. R. culicivorax infects and kills the larvae of many different mosquito species [30], and is the subject
of research programmes investigating its potential as a biocontrol agent against vectors of malaria and other
diseases [30,31]. R. culicivorax and T. spiralis differ fundamentally in many life-cycle and phenotypic characters.
Free-living R. culicivorax juveniles actively seek and invade mosquito larvae in the water [32], while T.
spiralis is transmitted as an arrested, first stage larva encysted in muscle tissue [33]. R. culicivorax embryos
are easily studied under laboratory conditions, and a single female can produce more than a thousand eggs
in culture. They display a developmental pattern that differs markedly from the C. elegans model. As in
other Dorylaimia and Enoplea [14,34], the first division is equal, and not asymmetric as in C. elegans. R.
culicivorax also shows an inversion of dorso-ventral axis polarity compared to C. elegans. A predominantly
monoclonal fate distribution in R. culicivorax somatic founder cells indicates fewer modifying inductions
between blastomeres [34,35]. Generation of the hypodermis involves repetitive cell elements extending from
posterior to anterior over the remainder of the embryo, a system very different from that of C. elegans [35]. In
the context of this distinct developmental mode in R. culicivorax, we decided to catalogue its developmental 
toolkit by sequencing the genome, and here present a draft assembly and annotation. We contrast the 
toolkits identified in R. culicivorax and T. spiralis with that of C. elegans, and of other metazoa, notably 
the arthropod Tribolium castaneum. We conclude that major changes in the regulatory logic of development 
have occurred during the evolution of nematodes, possibly as a consequence of developmental system drift, 
and that the model species C. elegans represents an extreme derivation from a shared metazoan ground 
system. 



Results and Discussion

Romanomermis culicivorax has a large and repetitive genome 

A draft genome assembly for the mermithid nematode R. culicivorax was generated from 26.9 gigabases 
(Gb) of filtered raw data (from a total of 41 Gb sequenced; Table 1). The assembly has a contig span of 
267 million base pairs (Mb) and a scaffold span of 323 Mb. The 52 Mb of spanned gaps are likely inflated 
estimates derived from our use of the SSPACE scaffolder. We do not currently have a validated independent 
estimate of genome size for R. culicivorax, but preliminary measurements with Feulgen densitometry suggest 
a size greater than 320 Mb (Elizabeth Martinez Salazar pers. comm.). The R. culicivorax genome is thus 
likely to be three-fold larger than that of C. elegans, and five-fold larger than that of T. spiralis (Table 2). The assembly
is currently in 62,537 scaffolds and contigs larger than 500 bp, with an N50 of 17.6 kb. The N50 for scaffolds
larger than 10 kb is 29.9 kb, and the largest scaffold is over 200 kb. The GC content is 36.3%, comparable
to the 38% of C. elegans and 34% of T. spiralis. We identified 47% of the R. culicivorax genome as repetitive.
To validate this estimate we repeated our repeat-finding approach against previously published genomes and 
achieved good accordance with published data (Table 2). The non-repetitive content of the R. culicivorax 
genome is thus approximately twice that of C. elegans and three times that of T. spiralis. T. spiralis thus 
stands out as having the least complex nematode genome sequenced thus far, and the contrast with R. 
culicivorax shows that small genomes are not a characteristic of Dorylaimia. 
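
The N50 values quoted above can be read against the usual definition: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The following is a minimal sketch under that assumption. The function name and the toy lengths are illustrative only and are not taken from the R. culicivorax assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First length at which the cumulative sum reaches half the total
        if running >= half_total:
            return length
    raise ValueError("no sequences given")

# Hypothetical scaffold lengths in bp, for illustration only
toy_scaffolds = [50_000, 40_000, 30_000, 20_000, 10_000, 5_000, 600]

print(n50(toy_scaffolds))
# N50 restricted to scaffolds larger than 10 kb, as done in the text
print(n50([s for s in toy_scaffolds if s > 10_000]))
```

Because the statistic depends on which sequences are included, restricting the input to longer scaffolds, as in the 10 kb figure above, can only leave the N50 unchanged or raise it.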

The RNA-Seq data were assembled into 29,095 isotigs in 22,418 isogroups spanning 23 Mb, and thus are likely
to be a reasonable estimate of the R. culicivorax transcriptome. Using BLAT [36], 21,204 of the isotigs were
found to be present (with matches covering >80% of the isotig) in single contigs or scaffolds of the genome 
assembly, suggesting reasonable biological completeness and contiguity. We also used the CEGMA approach 
to assess quality of the genome assembly, and found high representation (89.92% partial, 75.40% complete) 
and low proportion of duplicates (1.05 fold), suggesting a high quality assembly with limited retained haploid 
assembly duplicates (Table 1). Automated gene prediction from the assembly with iterative rounds of the 
MAKER pipeline, using the RNA-Seq data as evidence both directly and through GenomeThreader-derived 
mapping, yielded a total of nearly 50,000 gene models. These were reduced to 48,171 gene models by merging 
those with identities >99% using Cd-hit. This gene count would be surprisingly high for a nematode: C.
elegans has ~22,000 genes, T. spiralis has ~16,000, and Pristionchus pacificus has ~27,000. The excess
of R. culicivorax gene models may result from poorly assembled contigs, from assembly fragmentation,
and from "over-enthusiastic" prediction by gene modelers within the MAKER pipeline. Within the 48,171
predictions, 12,026 were derived from the Augustus modeler and 36,145 from SNAP. Because Augustus 
predictions conservatively require some external evidence (transcript mapping and/or sequence similarity 
to other known proteins), we regarded these as the most reliable and biologically complete. Exons of the 
Augustus-predicted genes in R. culicivorax had a median length of 161 bp, slightly larger than those in C. 
elegans (137 bp) and T. spiralis (128 bp). Introns of the R. culicivorax Augustus models, with a median of
405 bp, were much larger than those in C. elegans (69 bp) or T. spiralis (283 bp). The small introns observed
in C. elegans and other rhabditid nematodes (Table 2) are thus likely to be a derived feature. 
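
The transcript-based completeness check described above (isotigs counted as present when a match within a single contig or scaffold covers >80% of the isotig) can be sketched as a simple filter. This is an illustration only: the `Match` record, its field names and the toy values are hypothetical, and the sketch does not parse real BLAT output.

```python
from dataclasses import dataclass

@dataclass
class Match:
    isotig: str       # transcript identifier (hypothetical)
    isotig_len: int   # full isotig length in bp
    target: str       # contig or scaffold carrying the match
    matched: int      # isotig bases aligned to that single target

def well_covered(matches, min_cov=0.80):
    """Isotigs with at least one single-target match covering > min_cov of their length."""
    return {m.isotig for m in matches if m.matched / m.isotig_len > min_cov}

toy = [
    Match("isotig1", 1000, "scaffold_a", 950),
    Match("isotig2", 1000, "scaffold_b", 500),
    Match("isotig2", 1000, "scaffold_c", 450),  # split across two scaffolds
    Match("isotig3", 400, "contig_d", 330),
]
print(sorted(well_covered(toy)))

# Share of isotigs passing the criterion, from the counts given in the text
print(round(100 * 21_204 / 29_095, 1))
```

An isotig split between two scaffolds, like the hypothetical `isotig2`, fails the criterion even though most of its bases are assembled, which is why the measure reflects contiguity as well as completeness.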



We annotated 1,443 tRNAs in the R. culicivorax genome using INFERNAL [37] and tRNAscan-SE [38], of
which 382 were pseudogenes (see Table S5 for details). In comparison, T. spiralis has 134 tRNAs of which
7 are pseudogenes, while C. elegans has 606 tRNAs with 36 pseudogenes [39]. Threonine (Thr) tRNAs were
particularly overrepresented (676 copies), a finding echoed in the genomes of Meloidogyne incognita and
Meloidogyne floridensis (tylenchine nematodes, see Figure 1) [24] and in P. pacificus [20]. P. pacificus also
has an overrepresentation of Arginine tRNAs [39].

We have made the annotated R. culicivorax genome, with functional categorisations of predicted genes
and proteins and annotation features, available in a dedicated genome browser at http://romanomermis.nematod.es.

The R. culicivorax proteome retains conserved metazoan components lost in T. spiralis and C. elegans

The phylogenetic placement of R. culicivorax compared to C. elegans makes its genome ideal for exploring 
the likely genetic complexity of the ancestral nematode. With T. spiralis, it can be used to reveal the 
idiosyncrasies of the several genomes available for Rhabditida. To polarise this comparison, we used
data from the genome of the arthropod T. castaneum. The T. castaneum genome is of high quality [40]
and the pattern of development of this beetle is less derived than that of the major arthropod model
Drosophila melanogaster [41]. We used the orthoMCL pipeline to generate a set of gene clusters for the
four species R. culicivorax, T. spiralis, C. elegans and T. castaneum. The large sequence divergence
between the four species may have obscured orthology relationships, making inference of true functional
orthology problematic [42-44], but the parameters used (a BLAST E-value of 1e , and orthoMCL inflation
parameter of 1.5) can be regarded as relaxed (i.e. most inclusive) compared to other studies [44-46]. As
the R. culicivorax genome assembly may not be complete, we based inference of absence on shared loss in
both R. culicivorax and T. spiralis. Thus, we believe that our analyses were at a minimum able to identify
homologues where present, and thus we could robustly infer absence. While the orthoMCL pipeline is
regarded as very robust in accurately clustering unknown proteins [47], inferences of functional or biological
orthology are complex. Inferences of absence were explored in detail (Supplementary file 5).

We identified 3274 clusters that contained protein representatives from all three nematode genomes, 
and 2833 of these also contained at least one T. castaneum representative (Figure 2). These 2833 clusters 
represent a conserved metazoan and eukaryotic core proteome. There were many clusters that contained 
proteins from only one species of nematode, representing lineage specific expansions of novel protein 
families. T. spiralis had the lowest number of these (975), while C. elegans and R. culicivorax each had 
over two thousand. Interestingly, of the 2747 R. culicivorax-limited clusters, 324 (11.8%) had apparent 
orthologues in T. castaneum. Such clusters are candidates for retention of phylogenetically ancient genes by 
one nematode species and loss in the other two. 

T. spiralis appeared to have lost more phylogenetically ancient genes than had either R. culicivorax or C. 
elegans. T. spiralis and C. elegans shared only 412 clusters exclusive of R. culicivorax members, while R.
culicivorax and C. elegans shared 1298 clusters exclusive of T. spiralis. Despite their phylogenetic affinity, 
R. culicivorax and T. spiralis only shared 600 clusters exclusive of C. elegans. C. elegans and R. culicivorax 
shared very similar numbers of clusters with T. castaneum (2833 contain all species in the comparison; 853 
contain only C. elegans, R. culicivorax and T. castaneum, 569 C. elegans and T. castaneum, and 568 R. 
culicivorax and T. castaneum) (Figure 2). 
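
The cluster counts in this section come from sorting each cluster by the set of species represented in it. A minimal sketch of that bookkeeping is given here, assuming each cluster is available as a list of (species, gene) pairs. The species codes, gene identifiers and input layout are hypothetical and do not reproduce the actual orthoMCL output format.

```python
from collections import Counter

# Hypothetical species codes: Rcu = R. culicivorax, Tsp = T. spiralis,
# Cel = C. elegans, Tca = T. castaneum
NEMATODES = frozenset({"Rcu", "Tsp", "Cel"})

def membership_counts(clusters):
    """Count clusters by the exact set of species represented in each."""
    return Counter(frozenset(species for species, _ in cluster) for cluster in clusters)

def shared_by_all_nematodes(counts):
    """Clusters containing all three nematodes, with or without T. castaneum."""
    return sum(n for species, n in counts.items() if NEMATODES <= species)

toy_clusters = [
    [("Rcu", "g1"), ("Tsp", "g2"), ("Cel", "g3"), ("Tca", "g4")],
    [("Rcu", "g11"), ("Tsp", "g12"), ("Cel", "g13")],
    [("Rcu", "g5"), ("Cel", "g6")],
    [("Rcu", "g7"), ("Rcu", "g8")],   # species-limited cluster
    [("Rcu", "g9"), ("Tca", "g10")],  # retained in one nematode and the beetle
]
counts = membership_counts(toy_clusters)
print(shared_by_all_nematodes(counts))
print(counts[frozenset({"Rcu", "Tca"})])
```

Applied to the figures reported above, the same logic implies that 3274 - 2833 = 441 clusters contain all three nematodes but no T. castaneum member.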



The clusters containing only R. culicivorax and T. spiralis might identify functions important to these 
dorylaim nematodes. In the 461 T. spiralis and 806 R. culicivorax proteins in these clusters, a total of 
65 GO terms were found to be overrepresented (p<0.05 by Fisher's exact test) compared with the GO 
annotation set derived from the complete C. elegans proteome, and 33 were overrepresented when compared 
to annotation of the T. castaneum genome. There were 26 GO terms overrepresented in both comparisons. 
Clusters with R. culicivorax, T. spiralis and T. castaneum members (but lacking C. elegans members) 
contained 332 R. culicivorax, 573 T. spiralis and 445 T. castaneum proteins, and we identified 40
GO terms overrepresented compared to the GO annotated C. elegans proteome (see Supplementary file 
2). From this we suggest that T. spiralis may not have a typical dorylaim genome. The T. spiralis 
genome is reduced in content compared to other nematodes: it is smaller, has fewer genes overall, and 
has fewer phylogenetically ancient genes. This is congruent with the previously reported loss of proteins
with metabolic function in T. spiralis [26]. The evolutionary reasons behind this reduction remain obscure,
but could include loss of genetic capacity following acquisition of a unique lifestyle that lacks a free-living
stage, or genomic streamlining to permit rapid reproduction and growth. Many parasitic and endosymbiotic
prokaryotes and eukaryotes have reduced genome sizes [48].
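
The GO term overrepresentation reported above rests on Fisher's exact test applied to a 2x2 table for each term (proteins with and without the term, in the study set and in the reference annotation). The sketch below implements a one-sided version from the hypergeometric distribution using only the standard library. Whether the original analysis was one- or two-sided, and whether any multiple-testing correction was applied, is not stated in this section, so this should be read as an assumption-laden illustration with hypothetical counts.

```python
from math import comb

def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the table [[a, b], [c, d]].

    a: study-set proteins with the GO term    b: study-set proteins without it
    c: reference proteins with the term       d: reference proteins without it
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k) for k in range(a, upper + 1))
    return tail / denom

# Hypothetical counts: 6 of 20 study proteins carry a term,
# against 10 of 200 proteins in the reference proteome
p = fisher_right_tail(6, 14, 10, 190)
print(p < 0.05)
```

With dozens of GO terms tested in each comparison, a threshold of p<0.05 per term would be expected to admit some false positives, which is one reason the terms overrepresented in both comparisons are the more robust candidates.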



The genetic background of development in R. culicivorax and T. spiralis differs markedly from that of 
C. elegans 

In a recent multi-species developmental timecourse expression analysis within the genus Caenorhabditis, 
conserved sets of genes were found to be over-expressed in discrete portions of the developmental timeline 
from zygote to hatching larva [49]. In particular, this study suggests a conserved period in development
where a very restricted set of genes is expressed in all species, perhaps corresponding to a 'bauplan' stage 
in nematode development as has been proposed for Metazoa in general. To explore whether this model 
can be extended across Nematoda, we identified R. culicivorax and T. spiralis homologues of the 1725
developmentally regulated C. elegans genes extracted from this analysis [49]. Nearly half (845) of these
genes were not grouped in clusters with Dorylaimia proteins using orthoMCL. We were unable to identify 
any sequence homologs for 450 of the proteins in R. culicivorax using BLAST+. 

The remaining 395 proteins had BLAST+ hits to R. culicivorax proteins, but were so divergent that 
orthoMCL did not cluster them as orthologs with Dorylaimia proteins. Among these 395 with marginal 
matches, we found that 18 belonged to the C. elegans nuclear hormone receptor subfamilies, 5 were innexin 
type gap-junction proteins, 6 were TWiK potassium channel proteins and 5 were acetylcholine receptor
proteins. These protein families are particularly diverse and expanded in C. elegans [50-53], and we
suggest that the genes "missing" from R. culicivorax but having low-scoring BLAST+ matches represent
rapidly evolved, divergent duplications within the lineage leading to C. elegans. OrthoMCL is likely to be
correct in not clustering most of these proteins. The proportion of Caenorhabditis-restricted genes across
the developmental timecourse examined by Levin et al. [49] varied from 36.4% to 59.9% (Figure 3 and
Supplementary file 4). A surprisingly high proportion of the developmental genes acting during specific 
embryonic stage transitions appear to be unique to the genus Caenorhabditis or at least so divergent 
that functional orthology, including interaction with conserved partners, is doubtful. A striking difference 
between R. culicivorax and T. spiralis was apparent, with 238 of the developmentally differentially expressed 
C. elegans genes having a R. culicivorax homologue but not a T. spiralis homologue, while only 88 had 
a T. spiralis homologue but not an R. culicivorax one. Given the conservatism of body plan evolution in 
nematodes, these dramatic genetic differences suggest extensive, largely phenotypically "silent" changes 
in the genetic programmes orchestrating nematode development. We used computational comparisons of 
selected key molecular processes and pathways to tease out the differences between the model C. elegans 
and the two dorylaim species, T. spiralis and R. culicivorax. 
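
The partition of the developmentally regulated C. elegans genes used in this section (clustered with dorylaim proteins, unclustered but with a marginal BLAST+ match, or without any detectable match) amounts to a few set operations. The sketch below illustrates this with hypothetical gene identifiers and then checks that the counts reported in the text are internally consistent. The function and variable names are illustrative and not part of the original analysis.

```python
def partition_unclustered(genes, clustered, blast_hits):
    """Split genes lacking an orthoMCL cluster into marginal-match and no-match sets."""
    unclustered = genes - clustered
    marginal = unclustered & blast_hits   # BLAST+ hit, but not clustered
    no_hit = unclustered - blast_hits     # no sequence homologue detected
    return unclustered, marginal, no_hit

# Hypothetical toy identifiers
toy_genes = {"a", "b", "c", "d", "e", "f"}
toy_clustered = {"a", "b", "c"}
toy_blast = {"a", "b", "d"}
unclustered, marginal, no_hit = partition_unclustered(toy_genes, toy_clustered, toy_blast)
print(sorted(unclustered), sorted(marginal), sorted(no_hit))

# Counts reported in the text
TOTAL, UNCLUSTERED, NO_HIT, MARGINAL = 1725, 845, 450, 395
assert NO_HIT + MARGINAL == UNCLUSTERED
print(round(100 * UNCLUSTERED / TOTAL, 1))  # percentage not clustered with Dorylaimia proteins
```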



Core developmental pathways differ between nematodes 

There are important differences in the cellular biology of development between R. culicivorax and C. elegans
[34,35], and we used the genomic data to follow up on some of the more striking contrasts between the
dorylaim and the rhabditid patterns of development: primary axis polarity, segregation of maternal message
within the early embryo, hypodermis formation, the vulval specification pathways, epigenetic pathways 
(especially DNA methylation), sex determination and light sensing. 

In the C. elegans 2-cell stage mitotic spindles rotate 90° in the posterior germline cell, and the subsequent cell
divisions are orthogonal [54-56]. This rotation is not observed in R. culicivorax and division is longitudinal
[34]. In C. elegans and many other animals par genes are essential for cell polarisation [57] and polarised
distribution of PAR proteins results in the restriction of mitotic spindle rotation to one cell. C. elegans
mutants lacking par-2 and par-3 genes resemble the R. culicivorax phenotype, showing longitudinal spindle
orientation [58]. The par-2 gene was missing from both R. culicivorax and T. spiralis (Figure 3; Table
3). Additionally, no orthologues for the par-2-interacting genes let-99, gpr-1 or gpr-2, required for proper
embryonic spindle orientation in C. elegans [59], were identified in the dorylaims using orthoMCL clustering
or sensitive BLAST searches. We identified a candidate par-3 in R. culicivorax, but this was so divergent
from C. elegans, T. castaneum and T. spiralis par-3 that these putative orthologues were not clustered in
our analysis. The D. melanogaster par-3 ortholog bazooka functions in anterior-posterior axis formation,
but as in R. culicivorax and T. spiralis par-2 is absent from the fly [60]. Thus, we hypothesise that the
PAR-3 - PAR-2 system for regulating spindle positioning evolved within the lineage leading to the genus
Caenorhabditis. The divergent par-3-like gene in dorylaims may be involved in axis formation, but perhaps
interacts with different partner proteins.

Once polarity has been established in the early C. elegans embryo, many maternal messages are differentially
segregated into anterior or posterior blastomeres [56,61]. MEX-3 is an RNA-binding protein translated from
maternally-provisioned mRNAs found predominantly in early anterior blastomeres [62,63]. We identified a
highly divergent MEX-3 homologue in R. culicivorax, but found no orthologue in T. spiralis. 

To demonstrate the utility of the R. culicivorax system, and the power of the genome-to-development
model, we assayed gene expression in embryos using in situ hybridisation. We selected the mex-3 gene for
these studies, as it is strongly expressed and highly localised during a short time window in development
in C. elegans. The observed expression pattern in R. culicivorax is similar to that in C. elegans (Figure 5). In
the fertilized R. culicivorax egg mex-3 mRNA is initially equally distributed. Prior to first cleavage mex-3
mRNA is segregated to the anterior pole and thus becomes essentially restricted to the somatic S1 blastomere
(for nomenclature, see [14]). With the division of S1 it is localised to both daughter cells. After the 4-cell
stage the signal disappears gradually. Despite the presence, and apparent conservation of expression pattern,
of mex-3, we were unable to identify other components of the C. elegans maternal mRNA regulation system,
such as mex-5, mex-6 and spn-4, in either dorylaim species. While MEX-5 and MEX-6 are important for
controlled MEX-3 expression in C. elegans [64], the apparent absence of SPN-4 in R. culicivorax and T.
spiralis is particularly intriguing. SPN-4 links embryonic polarity conferred by the par genes and partners
to cell fate specification through maternally deposited mRNAs and proteins [65,66]. This suggests that the
core regulatory logic of the early control of axis formation and cell fate specification must differ significantly 
between the dorylaim species and C. elegans. 

The hypodermis in C. elegans is derived from specific descendants of the anterior S1 (AB) and the posterior S3
(C) founder cells [67]. In contrast, in R. culicivorax the hypodermis is derived from S2 (EMS) descendants,
which form repetitive ring structures that extend from posterior to anterior [35]. Several developmental
regulatory genes expressed in the hypodermis or associated with hypodermal development were present only
in C. elegans in our analysis (see Table 3 and Supplementary file 4). The GATA-like transcription factor gene
elt-3 is absent in the dorylaim species, but the elt-3 ortholog elt-1 is conserved in R. culicivorax, T.
spiralis and T. castaneum. These genes act redundantly in C. elegans hypodermis formation [68]. Thus elt-3
involvement must be an innovation in the rhabditid lineage, suggesting changes of interaction complexity 
during nematode evolution. 

In C. elegans, vulva formation is highly dependent on the initial cell-cell interactions of the anchor cell with
the neighboring vulva precursor cells (VPCs). Induction of the VPCs activates a complex gene regulatory
network which drives divisions and differentiations of the VPCs to form a functional vulva. The evolutionary
lability of this system has been explored in rhabditid nematodes, revealing the changing relative importances
of a series of cell-cell interactions, short and long range inductions, and lineage-autonomous specifications
[69,70]. The signal transduction pathways involved include a RTK/RAS/MAPK cascade, activated by EGF-
and wnt-signaling [71]. Among the downstream targets in C. elegans are for example lin-1 and the β-catenin
bar-1, which in turn regulates the HOX-5 ortholog lin-39 [72-74]. Our analysis shows that lin-1 and bar-1,
as well as other important regulators of vulva development, are absent from the genomes of R. culicivorax
and/or T. spiralis (Table 3 and Supplementary File 4). We identified a R. culicivorax gene with a low-quality
match to BAR-1 (24.2% sequence identity). This protein is not clustered with other dorylaim proteins, and
appears to be either a duplication of the β-catenin ortholog HMP-2 or another armadillo repeat-containing
protein and not orthologous to bar-1 (see Supplementary file 5). These shared patterns of absence again
indicate that the same morphological structures can be generated with very different genetic underpinnings.
While it is possible that vulva formation in the dorylaims is regulated without the bar-1 - lin-39 interactions,
as observed in P. pacificus [75], it may be that HOX genes function differently in the dorylaims: rather than
acting in a lineage-dependent manner (as in C. elegans [76,77]) they may act in a positional regulatory
manner, as in other animals [78,79].



Epigenetic regulation is key to developmental processes in many animals, but its roles in C. elegans are more 
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muted. While C. elegans has a reduced ability to methylate DNA [80], orthologue clusters restricted to R. 
culicivorax and T. spiralis (excluding C. elegans) were enriched for four methylation-associated GO terms. 
We also found significant enrichment (p<0.05) for GO terms describing chromatin and DNA methylation 
functions in the set of R. culicivorax proteins that lacked homologues in C. elegans (see Supplementary file 
2). Important roles for methylation and changes in methylation patterns in the development of T. spiralis 



have been inferred from transcriptional profiling 81 . In addition, methylation is important for the silencing 



of transposablc elements 82,83 and could play a crucial role in the highly repetitive R. culicivorax genome. 
The C. elegans genome was also found to be depleted for chromatin re-modeling genes of the Polycomb 
and Trithorax groups [84] . It is intriguing that we found orthologs of T. castaneum pleiohomeotic in R. 
culicivorax and T. spiralis, and orthologs of T. castaneum trithorax and Sex comb on midleg (5cm) in R. 
culicivorax. This suggests that dorylaim chromatin restructuring mechanisms may be much more arthropod- 
like than are those of C. elegans. The presence of an intact methylation machinery and conserved chromatin 
re-modelling factors opens the prospect of a role for epigenetic modification in developmental regulation
in dorylaim nematodes. 

Sex determination machinery 

The mechanism of sex determination differs considerably among animals and it has been claimed to be one of
the developmental programs most influenced by developmental system drift [16]. Sex ratios in R. culicivorax
are reported to be environmentally determined through in-host nematode density [85], and thus might be
fundamentally different from the system found and extensively analysed in C. elegans [86]. Environmental
sex determination is found in many nematode taxa, including Strongyloididae and Meloidogyninae (both 
Tylenchina), taxa more closely related to C. elegans. C. elegans sex determination is based on the X to 
autosome ratio, with males haploid for the X chromosome (XO), and hermaphrodites diploid (XX). This difference
is read by the master switch xol-1 [87], which acts through the three sdc genes [88-90] to regulate the
systemic secretion of HER-1, a ligand for the TRA-2 receptor [91-93]. TRA-2 in turn negatively regulates a
complex of fem genes, which regulates nuclear translocation of TRA-1, the final shared step in the pathway
that switches between male and female systems. We did not find credible homologues (through orthoMCL
and re-confirmation with BLAST) of xol-1, sdc-1, sdc-2, sdc-3, her-1 or tra-2 in either T. spiralis or R.
culicivorax (Table 3; Supplementary file 5), and thus these species are unlikely to use the HER-1/TRA-2
ligand-receptor system to coordinate organism-wide sexual differentiation.

Light sensing machinery in R. culicivorax



Light sensing with and without eye-like organs has been described in other mermithids [94, 95]. Although R.
culicivorax has no structurally evident eye spots, it is likely that invasion of the mosquito host on the surface
of the water body [32] and the migration of emerged nematodes back to the substrate to mate and
deposit eggs [96] involve phototactic behaviour. Preliminary experiments with R. culicivorax give support
to this view (J. Burr, pers. comm.), but the underlying physiology has not been explored. We identified
several GO terms associated with photoreceptor development and light sensing (see Supplementary file 2) in
R. culicivorax proteins in comparison to C. elegans and T. castaneum proteomes (in the set of R. culicivorax
proteins without homologues in these species). Two especially intriguing GO terms were 'phototaxis' and
'energy taxis'. Proteins associated with these GO terms had BLAST similarities to COUP transcription
factors, which in the mouse have been associated with cell fate determination in the eye [97].



In Mermis nigrescens, a close relative of R. culicivorax, a directional light sensing organ is found in the
anterior pharynx, where a cylinder of light-shadowing cells packed with a nematode hemoglobin shades a
central photoreceptor [94, 98, 99]. Globin-like domains are found in diverse gene families in Nematoda [100].
More R. culicivorax proteins were annotated with the GO term 'oxygen binding' than those of the other
species analysed (Supplementary figure 1). Several of these R. culicivorax proteins have BLAST matches to
bona fide globins and hemoglobins, and an optical shadowing function is possible for one or more of them.
Pigment granules are segregated into the hypodermis of R. culicivorax (see Figure 5) and may also have a
light-shadowing function [34]. We also found the GO term 'cellular pigment accumulation' in the set of R.
culicivorax proteins that had homologues with T. spiralis and T. castaneum, but not with C. elegans. The
protein associated with this GO term was most similar to Xenopus SHROOM2 protein, which is involved
in melanosome formation and expressed in the eye of the frog [101]. We also identified a candidate opsin
in R. culicivorax. The gene is partially supported by EST data, and could generate a 313 amino acid
protein with identities of 26% to the Bos taurus (accession NP_776991) and Didelphis aurita (ABC75817)
long-wave-sensitive opsins.



Conclusions 

By combining the R. culicivorax genome presented here with the published T. spiralis genome,
we have been able to explore the molecular diversity of Dorylaimia, and provide robust contrasts with
the intensively studied Rhabditida. Particularly surprising were the differences between R. culicivorax and
T. spiralis. The R. culicivorax genome is much larger than that of T. spiralis. A majority of the genome
was identified as repetitive, including many transposable elements. Despite the phylogenetic and lifestyle
affinities between the two dorylaims compared to C. elegans, the R. culicivorax genome retained many more 
genes in common with C. elegans than did T. spiralis. We suggest that T. spiralis may be an atypical 
representative of dorylaim nematodes, perhaps due to a highly derived life cycle. 

Our analyses identified many genes apparently absent from the dorylaim genomes. We used very relaxed
analysis parameters, and performed close analyses of genes identified as critical in C. elegans development
for which we could find no credible dorylaim orthologues. In these phylum-spanning comparisons, inferences
of gene orthology can be obscured by levels of divergence. In addition, the gene family birth rate in the
chromadorean lineage leading to C. elegans is high [25, 26], and therefore C. elegans was expected to have
many genes absent from the dorylaim species. Thus, we might not have found an R. culicivorax orthologue
for a specific gene for three reasons: it may have arisen in the branch leading to C. elegans; its sequence
divergence may be too great to permit clustering with potential homologs; or it was not assembled in the
draft dorylaim genomes. The analyses of C. elegans PAR-3 and D. melanogaster bazooka illustrate some of
these difficulties: the possible R. culicivorax orthologue was highly divergent. Whether or not we have been
able to identify all the orthologues of the key C. elegans genes present in the R. culicivorax and T. spiralis
genomes, the absence of an identified orthologue maximally implies loss from the genome, and minimally
implies significant sequence, and thus functional, divergence.

Between the model organisms C. elegans and D. melanogaster many key mechanisms governing early cell
patterning are divergent [54]. Our data indicate that a major divergence also exists within Nematoda.
T. spiralis and R. culicivorax share a lack of orthologues of genes involved in several core developmental
processes in C. elegans, and many of these C. elegans genes are restricted to the Rhabditida. It is thus
doubtful that these processes are regulated by the same molecular interactions across the phylum. On the
contrary, it is likely that developmental system drift has played (and still plays) a major role in nematode
evolution. The phenotypic conservatism associated with the vermiform morphology of nematodes [102] has
fostered unjustified expectations concerning the genetic programmes that determine these morphologies.
To be useful as a contrasting system to the 'canonical' C. elegans model, any nematode species must be 
accessible to both descriptive and manipulative investigation. Here, we have defined a reference genome for 
R. culicivorax, laying bare the core machinery available for developmental regulation, and demonstrated that 
in situ hybridisation approaches are feasible for this species. Along with the robust laboratory cultures long 
established, this makes R. culicivorax an attractive and tractable alternative model for understanding the 
evolutionary dynamics of nematode developmental biology. We have highlighted a few of the possible avenues
a research programme could follow: early axis formation and polarisation, the specification of hypodermis,
sex determination, vulva formation and the roles of epigenetic processes in developmental regulation. The
advent of robust, affordable and rapid genome sequencing also opens the vista of large-scale comparative
genomics of development across the phylum Nematoda [28] to better understand the diversity of the phylum
and also place the remarkable C. elegans model in the context of its peers. It will next be necessary to extend
these analyses to a broader set of developmental pathway genes and to a wider, more representative
sampling of nematode genomes across the full diversity of the phylum.



Methods 

Sequencing and Genome Assembly 

Genomic DNA was extracted from several hundred, mixed-sex, adult R. culicivorax specimens from a culture
first established in Ed Platzer's laboratory in Riverside, California. Illumina paired end and mate pair
sequencing with libraries of varying insert sizes, and Roche 454 single end sequencing, were performed at the
Cologne Center for Genomics - CCG (http://www.ccg.uni-koeln.de). A Roche 454 dataset of transcriptome
reads from cDNA synthesised from mixed developmental stages and sexes was also generated (see Table S1
for details of data generation).

The quality of the raw data was assessed with FastQC (v. 0.9) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
Adapter sequences and low quality data were trimmed from the Illumina paired end
data with custom scripts (see http://github.com/sujaikumar/assemblage) and from the mate pair libraries
with Cutadapt (v. 1.0) [103]. We constructed a preliminary genome assembly, with relaxed insert size
parameters, from the paired end Illumina libraries with the de-novo-assemble option of the clcAssemblyCell
(v. 4.03b) [104]. We validated the actual insert sizes of our libraries by mapping back the reads to this
preliminary assembly using clcAssemblyCell. The preliminary assembly was also used to screen out bacterial
and other contaminant data [105]. The transcriptome data were assembled with Roche GSAssembler
(Newbler; version 2.5). For the production assembly, we explored assembly parameters using different mixes of
our data, evaluating each for total span, maximal contig lengths, N50, number of contigs, representation of
the transcriptome, and conserved eukaryotic gene content (using the CEGMA pipeline, version 2.1 [106]).
The most promising assembly was scaffolded with the filtered Illumina mate pair read sets using SSPACE
(v. 1.2) [107]. As our genomic DNA derived from a population of nematodes of unknown genetic diversity, we
removed short contigs that mapped entirely within larger ones using Cd-hit (v. 4.5.7) [108] at a 95% cutoff.
A final round of superscaffolding was performed, linking scaffolds that had logically consistent matches to
the transcriptome data based on BLAT [36] hits and processed with SCUBAT (B. Elsworth, pers. comm.;
http://github.com/elswob/SCUBAT). The final genome assembly was again assessed for completeness by
assessing the mapping of the transcriptome contigs and with the CEGMA pipeline [106].
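
The contiguity metrics used above to compare candidate assemblies (total span, number of contigs, maximal length and N50) can be illustrated with a minimal Python sketch. This is not the code used in this study; the function names and the `min_length` argument are assumptions introduced here for illustration only.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total span."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_span = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_span:
            return length


def assembly_summary(lengths, min_length=0):
    """Summarise an assembly, counting only sequences longer than min_length
    (hypothetical helper; the threshold mimics the '>500bp' style of Table 1)."""
    kept = [n for n in lengths if n > min_length]
    return {
        "span": sum(kept),
        "count": len(kept),
        "max": max(kept),
        "n50": n50(kept),
    }
```

Calling `assembly_summary(scaffold_lengths, min_length=500)` on the list of scaffold lengths of a candidate assembly would give the kind of per-assembly figures that were compared when choosing the production assembly.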



Genome Annotation 

RepeatMasker (v.3.3.0) [109, 110], RepeatFinder [111] and RepeatModeler (v.1.0.5)
(http://www.repeatmasker.org/RepeatModeler.html; combining RECON (v.1.07) [112] and RepeatScout (v.1.05) [113])
were used to identify known and novel repetitive elements in the R. culicivorax genome. We employed the
MAKER pipeline [114] to find genes in the R. culicivorax genome assembly. In a first pass, the SNAP gene
predictor included in MAKER was trained with a CEGMA [106] derived output of predicted highly conserved
genes. As additional evidence we included the transcriptome assembly and a set of approximately 15,000
conserved nematode proteins derived from the NEMBASE4 database [115] (recalculated by J. Parkinson;
pers. comm.). In the second, definitive, pass we used the gene set derived from this first MAKER iteration
to train Augustus [116] inside the MAKER pipeline for a second run, also including evidence from
transcriptome to genome mapping obtained with GenomeThreader [117]. Codon usage in R. culicivorax, T. spiralis
and C. elegans was calculated using INCA (v2.1) [118]. Results were then compared to data from [119] (see
Supplementary files 1 and 3).
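
To make the notion of codon usage concrete, the following Python sketch tallies codons over a set of coding sequences and converts the counts to relative frequencies. It is only an illustration of the underlying quantity and does not reproduce the indices computed by INCA; the function names and the handling of ambiguous bases are assumptions made here.

```python
from collections import Counter


def codon_counts(cds_sequences):
    """Count in-frame codons across coding sequences.

    Codons containing characters other than A, C, G or T are skipped,
    and trailing bases that do not fill a codon are ignored.
    """
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    return counts


def codon_frequencies(cds_sequences):
    """Relative frequency of each observed codon (fractions sum to 1)."""
    counts = codon_counts(cds_sequences)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}
```

Frequencies obtained in this way for each species could then be set side by side, which is the kind of between-species comparison described above.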

We used Blast2GO (Blast2GO4Pipe, v. 2.5, January 2012 database issue) [120] to annotate the gene set
with Gene Ontology terms [121], based on BLAST matches with expect values less than 1e-5 to the
UniProt/SwissProt database (March 2012 snapshot), and domain annotations derived from the InterPro
database [122]. Comparison of annotations between three nematode species (R. culicivorax, C. elegans
and T. spiralis) and, as a reference outgroup, the holometabolous coleopteran arthropod Tribolium castaneum
was based on GO Slim data retrieved with Blast2GO. RNA genes were predicted using INFERNAL
(v.1.0.2) and the Rfam database [123], and tRNAscan-SE (v.1.3.1).



Orthology Screen 

We inferred clusters of orthologous proteins between R. culicivorax, T. spiralis and C. elegans, and the
beetle T. castaneum using OrthoMCL (v. 2.0.3) [124]. T. spiralis, C. elegans and T. castaneum protein
sets were downloaded from NCBI and WormBase (see Table S2) and redundancy screened with Cd-hit at
the 99% threshold. We selected an inflation parameter of 1.5 for MCL clustering (based on [125, 126])
within OrthoMCL to generate an inclusive clustering likely to contain even highly diverged
representatives from the four species. In analyses of selected developmental genes, clusters were manually
validated using NCBI-BLAST+ [127]. We affirmed the uniqueness of C. elegans proteins identified as
lacking homologues in the enoplean nematodes by comparing them to the R. culicivorax proteome using
BLAST. Those with no significant matches at all (all matches with E-values > 1e-5) were classified as
confirmed absent. Those having matches with E-values < 1e-5 were investigated further by surveying the
cluster memberships of the R. culicivorax matches. If the R. culicivorax protein was found to cluster with
a different C. elegans protein, the uniqueness to C. elegans was again confirmed. If the R. culicivorax
protein did not cluster with an alternative C. elegans protein, we reviewed the BLAST statistics (E-value,
identity and sequence coverage) of the match and searched the GenBank non redundant protein database
for additional evidence of possible orthology. Only if these tests yielded no indication of direct orthology
was the C. elegans protein designated absent from the enoplean set. Further details of the process are given
in Supplementary file 5.
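
The automated part of this decision procedure can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the scripts used in this study: the `Hit` record, the `classify` function, the returned labels and the rule that every significant match must be explained by another C. elegans cluster member are all hypothetical choices made here. The final step (review of BLAST statistics and searches of the GenBank non redundant database) was manual and is represented only by a 'manual review' outcome.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


@dataclass(frozen=True)
class Hit:
    subject: str   # identifier of the matched R. culicivorax protein
    evalue: float  # BLAST expect value of the match


def classify(query, hits, ce_partners):
    """Classify one C. elegans protein lacking a dorylaim cluster partner.

    query       -- identifier of the C. elegans protein
    hits        -- BLAST hits of the query against the R. culicivorax proteome
    ce_partners -- dict: R. culicivorax protein id -> set of C. elegans
                   protein ids found in the same OrthoMCL cluster
    """
    significant = [h for h in hits if h.evalue < E_VALUE_CUTOFF]
    if not significant:
        return "absent: no significant match"
    # A match is explained if the R. culicivorax protein already clusters
    # with a different C. elegans protein.
    unexplained = [
        h.subject
        for h in significant
        if not (ce_partners.get(h.subject, set()) - {query})
    ]
    if not unexplained:
        return "absent: matches cluster with other C. elegans proteins"
    return "manual review"
```

Only proteins returned as 'manual review' would then need the closer inspection of E-value, identity and coverage described above.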

We identified the protein sequences of 1,725 genes differentially expressed in C. elegans developmental stages
[49] and selected, using our OrthoMCL clustering, those apparently lacking orthologues in R. culicivorax and
T. spiralis (verified as above). Using WormBase (http://www.wormbase.org, release WS233) we surveyed
the C. elegans-restricted genes for their experimentally-defined roles in development.
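
The per-stage summary of these genes (the proportion of developmentally expressed genes at each embryonic stage that lack dorylaim orthologues, as plotted in Figure 4) amounts to a simple tally. The Python sketch below shows one way to compute it; the function name and the input layout (a mapping from gene to stage and a set of restricted genes) are assumptions made here, not a description of the scripts actually used.

```python
from collections import defaultdict


def restricted_fraction_by_stage(stage_of_gene, restricted_genes):
    """Fraction of genes per stage that lack dorylaim orthologues.

    stage_of_gene    -- dict: gene id -> embryonic stage (e.g. 1-10)
    restricted_genes -- set of gene ids without R. culicivorax or
                        T. spiralis orthologues
    """
    totals = defaultdict(int)
    restricted = defaultdict(int)
    for gene, stage in stage_of_gene.items():
        totals[stage] += 1
        if gene in restricted_genes:
            restricted[stage] += 1
    return {stage: restricted[stage] / totals[stage] for stage in sorted(totals)}
```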

Custom Perl scripts were used to group orthoMCL clusters on the basis of species membership patterns. 
The sets of clusters that contained (i) both T. spiralis and R. culicivorax members but no C. elegans 
members and (ii) T. spiralis and R. culicivorax and T. castaneum members but no C. elegans members 
were surveyed for GO annotations enriched in comparison to the whole C. elegans proteome (sets i and ii) 
and the T. castaneum proteome (set i), conducting Fisher's exact test as implemented in Blast2GO. To
improve annotation reliability, these proteins were compared again (using BLAST) to the UniProt/SwissProt
database and run through the Blast2GO pipeline in the same way as described above.
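
The two steps described in this paragraph, selecting clusters by species membership pattern and testing GO terms for enrichment, can be sketched in Python. The original grouping was done with custom Perl scripts and the test with Blast2GO; the code below is a standard-library illustration only. The function names, the data layout and the use of a one-sided test are assumptions introduced here and may differ from what Blast2GO computes.

```python
from math import comb


def select_clusters(clusters, species_of, required, excluded):
    """Return names of clusters containing every species in `required`
    and no species in `excluded`.

    clusters   -- dict: cluster name -> list of protein ids
    species_of -- dict: protein id -> species name
    """
    chosen = []
    for name, members in clusters.items():
        pattern = {species_of[p] for p in members}
        if required <= pattern and not (excluded & pattern):
            chosen.append(name)
    return chosen


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher's exact test for enrichment.

    2x2 table: a = test-set proteins with the GO term, b = test-set
    proteins without it, c = reference proteins with the term,
    d = reference proteins without it. Returns the probability of
    observing a or more annotated proteins in the test set by chance.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)
```

With `required={"R. culicivorax", "T. spiralis"}` and `excluded={"C. elegans"}` the first function returns set (i); adding "T. castaneum" to `required` gives set (ii). A GO term would then be reported as enriched when `fisher_enrichment` falls below the chosen significance threshold.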

Whole-mount in situ hybridization 



For in situ hybridisation we modified the freeze-crack procedure described previously for C. elegans [128]
and revised by Maduro et al. (2007; http://www.faculty.ucr.edu/~mmaduro/resources.htm). In particular,
to allow for reliable penetration of the durable R. culicivorax egg envelopes we initially partly removed the
protective layer by incubation in alkaline bleach solution (see [34]). Digoxigenin-labeled sense and antisense
RNA probes were generated from linearized pBs vectors (Stratagene, La Jolla, USA) containing a 400 bp
fragment of R. culicivorax mex-3 via run off in vitro transcription with T7 or T3 RNA-polymerase according
to the manufacturer's protocol (Roche, Mannheim, Germany). The concentration of the labeled probes was
about 300 ng/ml.

Figure 1: A simplified phylogenetic tree of the phylum Nematoda. The phylogeny, simplified
from [T 12 , emphasises the position of the main study species R. culicivorax, T. spiralis and C. elegans. 
The phylogenetic placements of species from Table 2 are given in grey. Currently no genomic data are 
available for Enoplia (Clade II). The order of branching of the basal nodes of Nematoda is unresolved. 

Figure 2: Clusters of homologous proteins. Shared and species-unique clusters of homologous proteins
from a comparison of the proteomes of Romanomermis culicivorax, Trichinella spiralis, Caenorhabditis
elegans and Tribolium castaneum using OrthoMCL.

Figure 3: The network of proteins interacting with PAR-2 and PAR-3 in Caenorhabditis
elegans and their orthologues in Romanomermis culicivorax and Trichinella spiralis. The
network cartoon is based on the core polarity pathway extracted from WormBase, derived from both genetic
and physical interactions. PAR-2 was missing from the dorylaim nematodes, as were the directly connected
mes-3 and mes-4 genes. The R. culicivorax PAR-3-like protein was not retrieved as an orthologue of C.
elegans and T. spiralis PAR-3 proteins, but was identified employing sensitive sequence similarity search.
See Table 3 for additional proteins interacting with PAR proteins and their presence-absence patterns.

Figure 4: Many genes that are developmentally important in Caenorhabditis elegans were
not present in Romanomermis culicivorax or Trichinella spiralis. R. culicivorax and T. spiralis 
orthologues of the 1,725 genes identified as important in embryogenesis in an analysis of gene expression in 
Caenorhabditis species [49] were sought. For each embryonic stage (1-10) in C. elegans we calculated the 
proportion of these genes that were apparently unique to the genus Caenorhabditis. 

Figure 5: In situ hybridisation revealing the pattern of distribution of mex-3 mRNA in early
embryos of Romanomermis culicivorax. We used the R. culicivorax mex-3 gene to demonstrate the applicability
of the in situ technique in this species and to investigate the patterns of segregation of this maternal RNA
in early development. The R. culicivorax mex-3 expression pattern is similar to that of C. elegans [62].
R. culicivorax embryos contain dark pigment granules that are asymmetrically segregated in development.
(A) At the 2-cell stage, maternal mex-3 mRNA is detected in the S1 blastomere. The cytoplasmic pigment
granules are predominantly in the P1 blastomere. (B) At the 4-cell stage, mex-3 mRNA is detected in
daughters of the anterior S1 cell. Cytoplasmic pigment granules are predominantly in the S2 blastomere. (C)
At a later stage (>20 cells), mex-3 mRNA is absent. The pigment granules are found in descendants of
S2 (S2d). (D) During early morphogenesis, the pigment granules are found in S2 descendants forming
hypodermis (S2d, hyp). (A-C) fixed embryos; (D) live embryo. Bar 10 µm. Orientation: anterior left.

Tables

Table 1 - Assembly and annotation statistics

Metric                            Result
Contigs >100bp span               267,342,457bp
Scaffolds >500bp span             322,765,761bp
Num. contigs/scaffolds            62,537
N50 contigs/scaffolds >500bp      17,632bp
N50 scaffolds >500bp              29,995bp
Max contig length                 28,847bp
Max scaffold length               201,054bp
Mean transcript length            593bp
Mean protein length               190aa
MAKER Augustus predictions        12,026 proteins
MAKER SNAP predictions            36,145 proteins
Num. ESTs (isogroups)             22,418 ESTs
Mean EST length                   330bp
80% BLAT sequence coverage        21,204 ESTs
CEGMA compl. completeness         75.40%
CEGMA Group 1 part. compl.        81.82%
CEGMA Group 2 part. compl.        91.07%
CEGMA Group 3 part. compl.        91.80%
                                  95.38%




Table 2 - Genome statistics 

Repeat content of different nematode genomes appears not to be directly correlated with 
genome size. Re-calculation in selected genomes shows little deviance from published data 
(in parentheses)* and thus indicates the validity of our inference for R. culicivorax. 

*For B. xylophilus and M. incognita only reference data is given as the same programs were used for initial inference 
(see references); A. suum not re-calculated. 



Species          Approximate*   Estimated         Median†       Median†         GC        Source
                 genome size    repeat content    exon length   intron length   content
C. elegans       100Mb          17% (16.5%)       145bp         69bp            38%       [18]
P. pacificus     165Mbp         15.3% (17%)       85bp          141bp           42%       [20, 25]
A. suum          334Mb          4.4%              144bp         907bp           37.9%     [129]
B. malayi        95Mb           16.5% (15%)       140bp         219bp           30%       [22]
B. xylophilus    69Mb           22.5%             183bp         69bp            40%       [25]
M. incognita     ~200Mb         36.7%             136bp         82bp            31%       [24]
T. spiralis      63Mb           19.8% (18%)       128bp         283bp           34%
R. culicivorax   > 270Mb        48.2%             161bp         405bp           36%       this work

*M. incognita genome size given as 86Mbp in [24] has been re-estimated to about 200Mbp (E. Danchin pers. 
comm.). 

† Median lengths for A. suum and T. spiralis were calculated in this work as these data are not given in the cited
publications.



Table 3 - Presence and absence of selected C. elegans proteins in Dorylaimia

Protein      R. culicivorax   T. spiralis

Early asymmetry
CDC-42       +                +
PKC-3        +                +
GPR-1        +                +
GPR-2        +                +
PAR-6        +                +
MES-6        +                +
MES-3
MES-4
GFL-1        +                +
LET-70       +                +

Axis formation
NUM-1        +                +
ZIM-1
MES-2
POS-1
SMA-6        +                +
SET-2
UBC-18       +                +
LET-99
OOC-3
OOC-5        +                +
GPA-16       +                +
PAR-5
ATX-2
MEX-5
MEX-6
UNC-120
NOS-2
OMA-1
RME-2        +                +

Sex determination
XOL-1 -
HER-1
SEX-1 +
FOX-1 +
SDC-1
SDC-2
SDC-3
TRA-2
FEM-1 +
FEM-2 +

Hypodermis and vulva formation
AFF-1
BAR-1
CEH-2
CEH-27
GRL-15
INX-5
LIN-1
PEB-1
ELT-3
ELT-1        +                +
SMA-3
SMA-5



Supplementary Files 

These will be available through the main author upon personal request in the preprint phase. 

Supplementary file 1 - Supplementary Figures and Tables

Supplementary file 2 - Fisher's exact test data

GO terms enriched in a set of protein clusters shared between Dorylaimia in comparison to (i) C. elegans
and (ii) T. castaneum proteomes.

Supplementary file 3 - Codon usage in R. culicivorax

Codon usage data.

Supplementary file 4 - Levin data

Genes identified as being differentially expressed in Caenorhabditis development by Levin et al. [49].

Supplementary file 5 - Analysis of orthoMCL output by BLAST+

BLAST+ results for specific C. elegans proteins not found in a cluster with Dorylaimia proteins.